Nextflow is a domain-specific language (DSL) implemented on top of the Groovy programming language, which in turn is a superset of the Java programming language. This means that Nextflow can run any Groovy or Java code.
Printing something is as easy as calling one of the print or println methods.
println("Hello, World!")
Hello, World!
To define a variable, simply assign a value to it:
x = 1
println x
1
x = new java.util.Date()
println x
Mon Dec 11 16:54:36 IST 2023
x = -3.1499392
println x
-3.1499392
x = false
println x
false
x = "Hi"
println x
Hi
Local variables are defined using the def keyword:
def x = 'foo'
println x
foo
The def keyword should always be used when defining variables local to a function or a closure.
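The effect of omitting def can be sketched in a plain Groovy script. The function and variable names below (withoutDef, withDef, leaked, contained) are hypothetical and exist only for this illustration, and the sketch assumes standard Groovy script semantics, where an undeclared assignment is stored in the script binding.

```groovy
// Hypothetical helper: assigns a variable WITHOUT def.
def withoutDef() {
    leaked = 'visible everywhere'   // stored in the script binding, not local
}

// Hypothetical helper: assigns a variable WITH def.
def withDef() {
    def contained = 'local only'    // exists only while the function runs
    return contained
}

withoutDef()
withDef()

// The undeclared variable escaped the function; the def variable did not.
assert binding.hasVariable('leaked')
assert !binding.hasVariable('contained')
```

Keeping variables local in this way avoids accidental name clashes between unrelated functions and closures.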
A List object can be defined by placing the list items in square brackets:
list = [10, 20, 30, 40]
The items in a list can be accessed by their index. List indexing begins at 0.
list = [10, 20, 30, 40]
println list[0]
println list.get(0)
10
10
The size method gives the length of a list.
list = [10, 20, 30, 40]
println list.size()
4
The assert keyword tests whether a condition is true (similar to an if statement). Groovy prints nothing if the condition holds; otherwise it raises an AssertionError.
list = [10, 20, 30, 40]
assert list[0] == 10
Lists can also be indexed with negative indexes and reversed ranges.
list = [0, 1, 2]
assert list[-1] == 2
assert list[-1..0] == list.reverse()
Info:
The last assert line indexes the original list with a range (written with the “shorthand” .. operator) that runs from the last element (index -1, value 2) back to the first element (index 0, value 0).
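As a small sketch of how ranges and negative indexes select list items, the snippet below uses a hypothetical list named letters. Ranges written with .. include both end points.

```groovy
def letters = ['a', 'b', 'c', 'd', 'e']

def firstThree = letters[0..2]     // items at indexes 0, 1 and 2
def lastTwo    = letters[-2..-1]   // negative indexes count from the end
def reversed   = letters[-1..0]    // a reversed range walks the list backwards

assert firstThree == ['a', 'b', 'c']
assert lastTwo == ['d', 'e']
assert reversed == letters.reverse()
```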
Maps are like lists whose items are looked up by an arbitrary key instead of an integer index, so the syntax is very similar.
map = [a: 0, b: 1, c: 2]
Maps can be accessed in a conventional square-bracket syntax or as if the key was a property of the map.
map = [a: 0, b: 1, c: 2]
assert map['a'] == 0
assert map.b == 1
assert map.get('c') == 2
To add data or to modify a map, the syntax is similar to adding values to a list:
map = [a: 0, b: 1, c: 2]
map['a'] = 'x'
map.b = 'y'
map.put('c', 'z')
assert map == [a: 'x', b: 'y', c: 'z']
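The following sketch shows a few other common map operations: adding a new key, checking for a key, and visiting each entry. The names counts and lines are hypothetical, and the each method with a two-parameter closure is standard Groovy that the text above does not cover.

```groovy
def counts = [a: 0, b: 1]

// Assigning to a key that does not exist yet adds a new entry.
counts.c = 2
assert counts.containsKey('c')

// Map literals keep their insertion order.
assert counts.keySet().toList() == ['a', 'b', 'c']

// Visit every entry; the block receives the key and the value.
def lines = []
counts.each { key, value -> lines << (key + '=' + value) }
assert lines == ['a=0', 'b=1', 'c=2']
```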
String literals can be defined by enclosing them in either single (') or double (") quotation marks.
foxtype = 'quick'
foxcolor = ['b', 'r', 'o', 'w', 'n']
println "The $foxtype ${foxcolor.join()} fox"
x = 'Hello'
println '$x + $y'
The quick brown fox
$x + $y
Info:
Note the different use of the $ and ${..} syntax to interpolate value expressions in a string literal. The $x variable was not expanded because it was enclosed in single quotes.
Finally, string literals can also be defined using the / character as a delimiter. They are known as slashy strings and are useful for defining regular expressions and patterns, as there is no need to escape backslashes. As with double-quoted strings, they can interpolate variables prefixed with a $ character.
Try the following to see the difference:
x = /tic\tac\toe/
y = 'tic\tac\toe'
println x
println y
tic\tac\toe
tic ac oe
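To show why slashy strings suit regular expressions, here is a minimal sketch that matches a file name against a pattern. The file name and pattern are made up for illustration; ==~ and =~ are Groovy's match and find operators.

```groovy
def fileName = 'sample_1.fastq.gz'          // hypothetical file name

// No doubled backslashes are needed inside a slashy string.
def pattern = /(\w+)_(\d)\.fastq\.gz/

// The same pattern in single quotes needs every backslash escaped.
assert pattern == '(\\w+)_(\\d)\\.fastq\\.gz'

// ==~ is true only when the whole string matches the pattern.
assert fileName ==~ pattern

// =~ creates a matcher that gives access to the captured groups.
def matcher = fileName =~ pattern
assert matcher.matches()
assert matcher.group(1) == 'sample'
assert matcher.group(2) == '1'
```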
The if statement uses the same syntax common in other programming languages, such as Java, C, JavaScript, etc.
if (< boolean expression >) {
    // true branch
}
else {
    // false branch
}
The else branch is optional. Also, the curly brackets are optional when the branch defines just a single statement.
x = 1
if (x > 10)
    println 'Hello'
null, empty strings, and empty collections evaluate to false.
Therefore, a statement like:
list = [1, 2, 3]
if (list != null && list.size() > 0) {
    println list
}
else {
    println 'The list is empty'
}
[1, 2, 3]
Can be written as:
list = [1, 2, 3]
if (list)
    println list
else
    println 'The list is empty'
[1, 2, 3]
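These truthiness rules can be checked directly with assert. The variable names below are hypothetical. The sketch also covers two cases the text above does not mention: an empty map and the number zero, both of which Groovy treats as false.

```groovy
def nothing = null
def emptyString = ''
def emptyList = []
def emptyMap = [:]
def zero = 0

// Each of these values evaluates to false in a boolean context.
assert !nothing
assert !emptyString
assert !emptyList
assert !emptyMap
assert !zero

// Non-empty values evaluate to true, whatever they contain.
def text = 'text'
def listWithZero = [0]
assert text
assert listWithZero
```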
The classical for loop syntax is supported as shown here:
for (int i = 0; i < 3; i++) {
    println("Hello World $i")
}
Hello World 0
Hello World 1
Hello World 2
Iteration over list objects is also possible using the syntax below:
list = ['a', 'b', 'c']
for (String elem : list) {
    println elem
}
a
b
c
In Groovy, a closure is an anonymous, user-defined function: a block of code that can be assigned to a variable or passed as an argument to another function. Thus, you can define a chunk of code and then pass it around as if it were a string or an integer.
square = { it * it }
The curly brackets around the expression it * it tell the script interpreter to treat this expression as code. The it identifier is an implicit variable that represents the value passed to the function when it is invoked.
Once compiled, the function object is assigned to the variable square like any other variable assignment shown previously. Now we can do something like this:
square = { it * it }
println square(9)
81
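Closures can also declare named parameters and be handed to other methods. The sketch below assumes a hypothetical list called numbers and uses the standard Groovy list methods collect and findAll, which the text above does not cover.

```groovy
// A closure with two named parameters instead of the implicit 'it'.
def add = { a, b -> a + b }
assert add(2, 3) == 5

def numbers = [1, 2, 3]
def square = { it * it }

// Passing a closure stored in a variable to a method.
def squares = numbers.collect(square)
assert squares == [1, 4, 9]

// Writing the closure inline; it can read variables defined outside it.
def threshold = 2
def large = numbers.findAll { n -> n > threshold }
assert large == [3]
```

The same pattern appears throughout Nextflow, where operators such as map and view take a closure as their argument.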
Channels are a key data structure of Nextflow. They allow the implementation of reactive, functional-style computational workflows based on the Dataflow programming paradigm.
They are used to logically connect tasks to each other or to implement functional style data transformations.
Nextflow distinguishes two different kinds of channels: queue channels and value channels.
A queue channel is an asynchronous unidirectional FIFO queue that connects two processes or operators.
Asynchronous means that operations are non-blocking.
Unidirectional means that data flows from a producer to a consumer.
FIFO (First In, First Out) means that the data is guaranteed to be delivered in the same order as it is produced.
A queue channel is implicitly created by process output definitions or using channel factories such as Channel.of or Channel.fromPath.
A value channel (a.k.a. singleton channel) is by definition bound to a single value, and it can be read an unlimited number of times without consuming its contents. A value channel is created using the value channel factory or by operators returning a single value, such as first, last, collect, count, min, max, reduce, and sum.
Channel factories are Nextflow commands for creating channels that have implicit expected inputs and functions.
value()
The value channel factory is used to create a value channel. An optional non-null argument can be specified to bind the channel to a specific value. For example:
ch1 = Channel.value()
ch2 = Channel.value('Hello there')
ch3 = Channel.value([1, 2, 3, 4, 5])
of()
The Channel.of factory creates a queue channel that emits the values specified as arguments.
ch = Channel.of(1, 3, 5, 7)
ch.view { "value: $it" }
The first line in this example creates a variable ch which holds a channel object. This channel emits the values specified as a parameter in the of channel factory. Thus the second line will print the following:
value: 1
value: 3
value: 5
value: 7
The Channel.of channel factory works in a similar manner to Channel.from (which is now deprecated), fixing some inconsistent behaviors of the latter and providing better handling when specifying a range of values.
fromList()
The Channel.fromList channel factory creates a channel emitting the elements provided by a list object specified as an argument:
list = ['hello', 'world']
Channel
.fromList(list)
fromPath()
The fromPath channel factory creates a queue channel emitting one or more files matching the specified glob pattern.
Channel.fromPath('./data/meta/*.csv')
This example creates a channel and emits as many items as there are files with a csv extension in the ./data/meta folder. Each element is a file object implementing the Path interface.
Tip:
Two asterisks, i.e. **, work like * but cross directory boundaries. This syntax is generally used for matching complete paths. Curly brackets specify a collection of sub-patterns.
fromFilePairs()
The fromFilePairs channel factory creates a channel emitting the file pairs matching a glob pattern provided by the user. The matching files are emitted as tuples, in which the first element is the grouping key of the matching pair and the second element is the list of files (sorted in lexicographical order).
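To picture the shape of the tuples described above, the following plain Groovy sketch groups a list of made-up file names by their shared prefix. It is only an illustration of the grouping idea, not how Nextflow implements fromFilePairs: the file names are hypothetical, no files are read from disk, and strings stand in for the file objects that the real factory emits.

```groovy
// Hypothetical file names following the *_{1,2}.fastq.gz naming scheme.
def fileNames = ['liver_2.fastq.gz', 'gut_1.fastq.gz',
                 'liver_1.fastq.gz', 'gut_2.fastq.gz']

// The grouping key is the name with the _1/_2 suffix and extension removed.
def grouped = fileNames.groupBy { name -> name.replaceAll(/_[12]\.fastq\.gz/, '') }

// Build [key, [sorted files]] tuples, ordered by key for a stable result.
def pairs = grouped
    .collect { entry -> [entry.key, entry.value.sort()] }
    .sort { tuple -> tuple[0] }

assert pairs == [
    ['gut',   ['gut_1.fastq.gz', 'gut_2.fastq.gz']],
    ['liver', ['liver_1.fastq.gz', 'liver_2.fastq.gz']]
]
```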
#!/usr/bin/env nextflow
// Channel with explicit values
ch = Channel.of(1, 3, 5, 7)
ch.view { "value: $it" }
// Channel from a list
list = ['hello', 'world']
Channel.fromList(list).view()
// Channel from a text file
Channel.fromPath('./bin/text_input.txt').splitText().view()
// Channel from file pairs matching a pattern
Channel.fromFilePairs('./data/reads/*_{1,2}.fastq.gz').view()
nextflow run bin/example_channels.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/example_channels.nf` [sharp_swartz] DSL2 - revision: b19d201e37
value: 1
value: 3
value: 5
value: 7
hello
world
ENSG00000157764
NM_001301717
chr17:43044295-43170245
RS123456
gene123
Operators are methods that allow you to connect channels, transform values, or apply some user-provided rules.
view()
The view operator prints the items emitted by a channel to the console standard output, appending a new line character to each item. For example:
cat bin/view_operator.nf
Channel
.of('foo', 'bar', 'baz')
.view()
nextflow run bin/view_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/view_operator.nf` [pedantic_pesquet] DSL2 - revision: 969a56608b
foo
bar
baz
map()
The map operator applies a function of your choosing to every item emitted by a channel and returns the items obtained as a new channel. The function applied is called the mapping function and is expressed with a closure as shown in the example below:
cat bin/map_operator1.nf
Channel
.of('hello', 'world')
.map { it -> it.reverse() }
.view()
nextflow run bin/map_operator1.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/map_operator1.nf` [focused_mercator] DSL2 - revision: 991adec947
olleh
dlrow
The map operator can also associate a tuple with each element. The tuple can contain any data.
cat bin/map_operator2.nf
Channel
.of('hello', 'world')
.map { word -> [word, word.size()] }
.view { word, len -> "$word contains $len letters" }
nextflow run bin/map_operator2.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/map_operator2.nf` [naughty_mercator] DSL2 - revision: 5d9f685614
hello contains 5 letters
world contains 5 letters
mix()
The mix operator combines the items emitted by two (or more) channels into a single channel.
cat bin/mix_operator.nf
my_channel_1 = Channel.of(1, 2, 3)
my_channel_2 = Channel.of('a', 'b')
my_channel_3 = Channel.of('z')
my_channel_1
.mix(my_channel_2, my_channel_3)
.view()
nextflow run bin/mix_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/mix_operator.nf` [chaotic_goldberg] DSL2 - revision: d9ca7f7409
z
1
2
3
a
b
join()
The join operator creates a channel that joins together the items emitted by two channels with a matching key. The key is defined, by default, as the first element in each item emitted.
cat bin/join_operator.nf
left = Channel.of(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right = Channel.of(['Z', 6], ['Y', 5], ['X', 4])
left.join(right).view()
nextflow run bin/join_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/join_operator.nf` [amazing_stonebraker] DSL2 - revision: 5951346a4b
[Z, 3, 6]
[Y, 2, 5]
[X, 1, 4]
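The matching rule behind this output can be sketched with ordinary Groovy lists. This is an illustration of key matching only, not Nextflow's implementation; the names leftItems, rightItems, rightByKey, and joined are hypothetical, and the sketch does not model the order in which a real channel emits items.

```groovy
def leftItems  = [['X', 1], ['Y', 2], ['Z', 3], ['P', 7]]
def rightItems = [['Z', 6], ['Y', 5], ['X', 4]]

// Index the right-hand items by their key (the first element).
def rightByKey = [:]
for (item in rightItems) {
    rightByKey[item[0]] = item[1]
}

// Keep only left-hand items whose key also exists on the right,
// then append the matching right-hand value.
def joined = leftItems
    .findAll { item -> rightByKey.containsKey(item[0]) }
    .collect { item -> [item[0], item[1], rightByKey[item[0]]] }

assert joined == [['X', 1, 4], ['Y', 2, 5], ['Z', 3, 6]]
```

The item with key P has no partner on the right, which is why it is absent from the output above.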
combine()
The combine operator combines (cartesian product) the items emitted by two channels or by a channel and a Collection object (as right operand). The combine operator returns a queue channel. For example:
cat bin/combine_operator1.nf
numbers = Channel.of(1, 2, 3)
words = Channel.of('hello', 'ciao')
numbers
.combine(words)
.view()
nextflow run bin/combine_operator1.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/combine_operator1.nf` [nostalgic_kimura] DSL2 - revision: aa84e750a1
[1, hello]
[2, hello]
[3, hello]
[1, ciao]
[2, ciao]
[3, ciao]
A second version of the combine operator allows you to combine items that share a common matching key. The index of the key element is specified by using the by parameter (zero-based index, multiple indices can be specified as a list of integers). For example:
cat bin/combine_operator2.nf
left = Channel.of(['A', 1], ['B', 2], ['A', 3])
right = Channel.of(['B', 'x'], ['B', 'y'], ['A', 'z'], ['A', 'w'])
left
.combine(right, by: 0)
.view()
nextflow run bin/combine_operator2.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/combine_operator2.nf` [reverent_shockley] DSL2 - revision: f0fa879250
[B, 2, x]
[B, 2, y]
[A, 1, z]
[A, 3, z]
[A, 1, w]
[A, 3, w]
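As a rough sketch of what combining by key means, the nested loop below pairs every left-hand item with every right-hand item that shares the same first element. It uses plain Groovy lists with hypothetical names and is not Nextflow's implementation; in particular, it does not reproduce the emission order of a real channel, so the result is compared as a set.

```groovy
def leftTuples  = [['A', 1], ['B', 2], ['A', 3]]
def rightTuples = [['B', 'x'], ['B', 'y'], ['A', 'z'], ['A', 'w']]

// Cartesian product restricted to tuples whose keys (index 0) are equal.
def combined = []
for (l in leftTuples) {
    for (r in rightTuples) {
        if (l[0] == r[0]) {
            combined << [l[0], l[1], r[1]]
        }
    }
}

def expected = [['B', 2, 'x'], ['B', 2, 'y'], ['A', 1, 'z'],
                ['A', 3, 'z'], ['A', 1, 'w'], ['A', 3, 'w']]
assert (combined as Set) == (expected as Set)
```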
concat()
The concat operator allows you to concatenate the items emitted by two or more channels into a new channel. The items emitted by the resulting channel are in the same order as specified in the operator arguments.
In other words, given N channels, the items from the (i+1)th channel are emitted only after all of the items from the ith channel have been emitted.
For example:
cat bin/concat_operator.nf
a = Channel.of('a', 'b', 'c')
b = Channel.of(1, 2, 3)
c = Channel.of('p', 'q')
c.concat( b, a ).view()
nextflow run bin/concat_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/concat_operator.nf` [sad_gates] DSL2 - revision: aceedaf42d
p
q
1
2
3
a
b
c
count()
The count operator creates a channel that emits a single item: a number that represents the total number of items emitted by the source channel. For example:
cat bin/count_operator.nf
Channel
.of(9,1,7,5)
.count()
.view()
nextflow run bin/count_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/count_operator.nf` [sleepy_wilson] DSL2 - revision: d3408f2eb5
4
ifEmpty()
The ifEmpty operator creates a channel that emits a default value, specified as the operator parameter, when the channel to which it is applied is empty, i.e. doesn’t emit any value. Otherwise it emits the same sequence of entries as the original channel.
Because the source channel in the following example is not empty, its original items are printed:
cat bin/ifempty_operator.nf
Channel
.of(1, 2, 3)
.ifEmpty('Hello')
.view()
nextflow run bin/ifempty_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/ifempty_operator.nf` [maniac_murdock] DSL2 - revision: 188c99253a
1
2
3
toSortedList()
The toSortedList operator collects all the items emitted by a channel into a List object, sorts them, and emits the resulting collection as a single item. For example:
cat bin/tosortedlist_operator.nf
Channel
.of( 3, 2, 1, 4 )
.toSortedList()
.subscribe onNext: { println it }, onComplete: { println 'Done' }
nextflow run bin/tosortedlist_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/tosortedlist_operator.nf` [prickly_sax] DSL2 - revision: 25e810eb00
[1, 2, 3, 4]
Done
unique()
The unique operator allows you to remove duplicate items from a channel and only emit single items with no repetition.
For example:
cat bin/unique_operator.nf
Channel
.of( 1, 1, 1, 5, 7, 7, 7, 3, 3 )
.unique()
.view()
nextflow run bin/unique_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/unique_operator.nf` [drunk_gauss] DSL2 - revision: 5722dca47a
1
5
7
3
take()
The take operator emits only the first n items emitted by a channel. For example:
cat bin/take_operator.nf
Channel
.of( 1, 2, 3, 4, 5, 6 )
.take( 3 )
.view()
nextflow run bin/take_operator.nf
N E X T F L O W ~ version 23.04.1
Launching `bin/take_operator.nf` [maniac_engelbart] DSL2 - revision: dd5943541c
1
2
3
Use a Groovy closure to find the maximum value in a list of numbers.
Create a function to count the occurrences of a specific word in a text.
Read the fastq files present in data/reads/ into a channel, then parse the metadata and add it to the channel.